Callisto-RTS: Fine-Grain Parallel Loops
نویسندگان
چکیده
We introduce Callisto-RTS, a parallel runtime system designed for multi-socket shared-memory machines. It supports very fine-grained scheduling of parallel loops— down to batches of work of around 1K cycles. Finegrained scheduling helps avoid load imbalance while reducing the need for tuning workloads to particular machines or inputs. We use per-core iteration counts to distribute work initially, and a new asynchronous request combining technique for when threads require more work. We present results using graph analytics algorithms on a 2-socket Intel 64 machine (32 h/w contexts), and on an 8-socket SPARC machine (1024 h/w contexts). In addition to reducing the need for tuning, on the SPARC machines we improve absolute performance by up to 39% (compared with OpenMP). On both architectures Callisto-RTS provides improved scaling and performance compared with a state-of-the-art parallel runtime system (Galois).
منابع مشابه
Eigenvectors-based parallelisation of nested loops with affine dependences
Abstract This paper presents a method for parallelising nested loops with affine dependences. The data dependences of a program are represented exactly using a dependence matrix rather than an imprecise dependence abstraction. By a careful analysis of the eigenvectors and eigenvalues of the dependence matrix, we detect the parallelism inherent in the program, partition the iteration space of th...
متن کاملDistributed Filaments: Eecient Fine-grain Parallelism on a Cluster of Workstations Distributed Filaments: Eecient Fine-grain Parallelism on a Cluster of Workstations
A ne-grain parallel program is one in which processes are typically small, ranging from a few to a few hundred instructions. Fine-grain parallelism arises naturally in many situations, such as iterative grid computations, recursive fork/join programs, the bodies of parallel FOR loops, and the implicit parallelism in functional or dataaow languages. It is useful both to describe massively parall...
متن کاملSelective inline expansion for improvement of multi grain parallelism
This paper proposes a selective procedure inlining scheme to improve a multi-grain parallelism, which hierarchically exploits the coarse grain task parallelism among loops, subroutines and basic blocks and near fine grain parallelism among statements inside a basic block in addition to the loop parallelism. Using the proposed scheme, the parallelism among different layers(nested levels) can be ...
متن کاملDistributed Filaments: E cient Fine-Grain Parallelism on a Cluster of Workstations
A ne-grain parallel program is one in which processes are typically small, ranging from a few to a few hundred instructions. Fine-grain parallelism arises naturally in many situations, such as iterative grid computations , recursive fork/join programs, the bodies of parallel FOR loops, and the implicit parallelism in functional or dataaow languages. It is useful both to describe massively paral...
متن کاملCache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding
The wide use of multiprocessor system has been making automatic parallelizing compilers more important. To improve the performance of multiprocessor system more by compiler, multigrain parallelization is important. In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to the traditional loop...
متن کامل